
    Supporting Human-AI Collaboration in Auditing LLMs with LLMs

    Large language models are becoming increasingly pervasive in society through their deployment in sociotechnical systems. Yet these language models, whether used for classification or generation, have been shown to be biased and to behave irresponsibly, causing harm to people at scale. It is crucial to audit these language models rigorously. Existing auditing tools leverage humans, AI, or both to find failures. In this work, we draw upon literature in human-AI collaboration and sensemaking, and conduct interviews with research experts in safe and fair AI, to build upon the auditing tool AdaTest (Ribeiro and Lundberg, 2022), which is powered by a generative large language model (LLM). Through the design process we highlight the importance of sensemaking and human-AI communication in leveraging the complementary strengths of humans and generative models in collaborative auditing. To evaluate the effectiveness of the augmented tool, AdaTest++, we conduct user studies with participants auditing two commercial language models: OpenAI's GPT-3 and Azure's sentiment analysis model. Qualitative analysis shows that AdaTest++ effectively leverages human strengths such as schematization, hypothesis formation, and testing. Further, with our tool, participants identified a variety of failure modes, covering 26 different topics across 2 tasks, including failures reported in prior formal audits as well as previously under-reported ones. (Comment: 21 pages, 3 figures)
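
    As a rough illustration of the generator-in-the-loop auditing cycle described above, the sketch below has an LLM-style generator propose variations of seed tests and keeps the candidates the target model gets wrong for a human auditor to review. All names (audit_round, toy_generator, toy_sentiment) are hypothetical stand-ins, not AdaTest++'s or the audited services' actual APIs.

```python
# Toy auditing round: an LLM-style generator proposes new tests from seeds,
# and any candidate the target model misclassifies is kept as a failure for
# the human auditor to inspect. All names here are hypothetical stand-ins.
from typing import Callable, List, Tuple

def audit_round(
    seed_tests: List[Tuple[str, str]],                      # (text, expected label)
    generate_candidates: Callable[[List[str]], List[str]],  # generative LLM stand-in
    target_model: Callable[[str], str],                     # model under audit
    expected_label: str,
) -> List[str]:
    """Generate variations of the seed tests and return the ones that fail."""
    candidates = generate_candidates([text for text, _ in seed_tests])
    return [c for c in candidates if target_model(c) != expected_label]

def toy_generator(seeds: List[str]) -> List[str]:
    # Stand-in for the generative LLM: simple hedging and casing perturbations.
    return [s + ", I guess" for s in seeds] + [s.upper() for s in seeds]

def toy_sentiment(text: str) -> str:
    # Stand-in for the commercial sentiment model being audited.
    return "negative" if "guess" in text.lower() else "positive"

seeds = [("The staff were wonderful", "positive")]
print(audit_round(seeds, toy_generator, toy_sentiment, expected_label="positive"))
# -> ['The staff were wonderful, I guess']  (a candidate failure to review)
```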

    Aligning Offline Metrics and Human Judgments of Value for Code Generation Models

    Large language models have demonstrated great potential to assist programmers in generating code. For such human-AI pair programming scenarios, we empirically demonstrate that while generated code is most often evaluated in terms of its functional correctness (i.e., whether generations pass available unit tests), correctness does not fully capture (e.g., may underestimate) the productivity gains these models may provide. Through a user study with N = 49 experienced programmers, we show that while correctness captures high-value generations, programmers still rate code that fails unit tests as valuable if it reduces the overall effort needed to complete a coding task. Finally, we propose a hybrid metric that combines functional correctness and syntactic similarity and show that it achieves a 14% stronger correlation with value and can therefore better represent real-world gains when evaluating and comparing models. (Comment: Accepted at ACL 2023 Findings)
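
    The hybrid metric is described only at a high level here, so the following is a minimal sketch of one plausible form: a convex combination of unit-test pass rate and syntactic similarity to a reference solution. The alpha weighting and the use of difflib are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of a hybrid value metric: a weighted mix of functional correctness
# (unit-test pass rate) and syntactic similarity to a reference solution.
import difflib
from typing import List

def passes(test_src: str, generated_src: str) -> bool:
    """True if the generated code plus the test snippet runs without error."""
    env: dict = {}
    try:
        exec(generated_src, env)
        exec(test_src, env)
        return True
    except Exception:
        return False

def hybrid_value(generated: str, reference: str, tests: List[str],
                 alpha: float = 0.5) -> float:
    """Convex combination of pass rate and character-level similarity."""
    pass_rate = sum(passes(t, generated) for t in tests) / len(tests) if tests else 0.0
    similarity = difflib.SequenceMatcher(None, generated, reference).ratio()
    return alpha * pass_rate + (1 - alpha) * similarity

generated = "def add(a, b):\n    return a - b"          # buggy generation
reference = "def add(a, b):\n    return a + b"
print(hybrid_value(generated, reference, tests=["assert add(2, 3) == 5"]))
# Fails the unit test but still scores for being syntactically close.
```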

    ICE: Enabling Non-Experts to Build Models Interactively for Large-Scale Lopsided Problems

    Quick interaction between a human teacher and a learning machine presents numerous benefits and challenges when working with web-scale data. The human teacher guides the machine towards accomplishing the task of interest. The learning machine leverages big data to find examples that maximize the training value of its interaction with the teacher. When the teacher is restricted to labeling examples selected by the machine, this problem is an instance of active learning. When the teacher can provide additional information to the machine (e.g., suggestions on what examples or predictive features should be used) as the learning task progresses, the problem becomes one of interactive learning. To accommodate the two-way communication channel needed for efficient interactive learning, the teacher and the machine need an environment that supports an interaction language. The machine can access, process, and summarize more examples than the teacher can see in a lifetime. Based on the machine's output, the teacher can revise the definition of the task or make it more precise. Both the teacher and the machine continuously learn and benefit from the interaction. We have built a platform to (1) produce valuable and deployable models and (2) support research on both the machine learning and user interface challenges of the interactive learning problem. The platform relies on a dedicated, low-latency, distributed, in-memory architecture that allows us to construct web-scale learning machines with quick interaction speed. The purpose of this paper is to describe this architecture and demonstrate how it supports our research efforts. Preliminary results are presented as illustrations of the architecture but are not the primary focus of the paper.
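
    As a concrete, if drastically simplified, picture of the interactive-learning loop described above, the sketch below lets the machine query the example it is least certain about while the teacher also supplies predictive features. The tiny dataset, the feature list, and scikit-learn are stand-ins; this is not ICE's distributed architecture or API.

```python
# Sketch of an interactive-learning round: the machine trains on the current
# labels, queries the unlabeled example it is least certain about, and the
# teacher both answers and contributes predictive features.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

labeled = {"great product, loved it": 1, "terrible, waste of money": 0}
unlabeled = ["loved the fast shipping", "money wasted, broken on arrival",
             "it arrived on a Tuesday"]

# Interactive (not just active) learning: the teacher suggests features too.
teacher_features = ["great", "loved", "terrible", "waste", "wasted", "broken"]
vectorize = CountVectorizer(vocabulary=teacher_features).transform

for _ in range(2):                                   # two teaching rounds
    model = LogisticRegression().fit(vectorize(list(labeled)),
                                     np.array(list(labeled.values())))
    # The machine picks the unlabeled example whose prediction is closest to 50/50.
    probs = model.predict_proba(vectorize(unlabeled))[:, 1]
    query = unlabeled.pop(int(np.argmin(np.abs(probs - 0.5))))
    # Stand-in for the human teacher's label on the queried example.
    labeled[query] = 0 if ("waste" in query or "broken" in query) else 1

print(labeled)
```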

    Predicting Academic Success Based on Learning Material Usage

    In this work, we explore students' usage of online learning material as a predictor of academic success. In the context of an introductory programming course, we recorded the amount of time that each element, such as a text paragraph or an image, was visible on the students' screen. Then, we applied machine learning methods to study the extent to which material usage predicts course outcomes. Our results show that the time spent with each paragraph of the online learning material is a moderate predictor of student success even when corrected for student time-on-task, and that this information can be used to identify at-risk students. The predictive performance of the models depends on the quantity of data, and the predictions become more accurate as the course progresses. In a broader context, our results indicate that course material usage can be used to predict academic success, and that such data can be collected in situ with minimal interference to the students' learning process. (Peer reviewed)
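
    To make the modeling step concrete, here is a minimal sketch of turning per-element visibility times into features and flagging at-risk students. The toy data and the logistic-regression model are illustrative assumptions, not the study's actual pipeline.

```python
# Sketch: per-paragraph dwell times -> normalized usage features -> at-risk flags.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Rows: students; columns: seconds each material element was visible on screen.
dwell_seconds = np.array([
    [120, 300,  45,  80],
    [ 10,  20,   5,  15],
    [200, 180,  90, 140],
    [ 30,  25,  10,   5],
])
passed = np.array([1, 0, 1, 0])          # course outcome per student

# Normalize by total time-on-task so the model sees relative material usage.
features = dwell_seconds / dwell_seconds.sum(axis=1, keepdims=True)

model = LogisticRegression().fit(features, passed)
at_risk = model.predict_proba(features)[:, 1] < 0.5   # flag likely failures
print(at_risk)
```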

    Trust in AutoML: Exploring Information Needs for Establishing Trust in Automated Machine Learning Systems

    We explore trust in a relatively new area of data science: Automated Machine Learning (AutoML). In AutoML, AI methods are used to generate and optimize machine learning models by automatically engineering features, selecting models, and optimizing hyperparameters. In this paper, we seek to understand what kinds of information influence data scientists' trust in the models produced by AutoML. We operationalize trust as a willingness to deploy a model produced using automated methods. We report results from three studies -- qualitative interviews, a controlled experiment, and a card-sorting task -- to understand the information needs of data scientists for establishing trust in AutoML systems. We find that including transparency features in an AutoML tool increased users' trust in, and understanding of, the tool, and that of all the proposed features, model performance metrics and visualizations are the most important information for data scientists when establishing trust in an AutoML tool. (Comment: IUI 2020)
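
    As one concrete reading of the transparency finding, the sketch below runs a small automated model search and surfaces every candidate's configuration and cross-validated score rather than only the winner. A scikit-learn GridSearchCV over a tiny grid stands in for a full AutoML system; it is not the tool studied in the paper.

```python
# Sketch of a transparency feature: report per-candidate metrics from an
# automated model search so a data scientist can inspect, not just accept,
# the selected model.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
)
search.fit(X, y)

# Show every candidate's configuration and cross-validated score.
for params, score in zip(search.cv_results_["params"],
                         search.cv_results_["mean_test_score"]):
    print(params, round(score, 3))
print("selected:", search.best_params_)
```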

    Researching AI Legibility Through Design

    Everyday interactions with computers are increasingly likely to involve elements of Artificial Intelligence (AI). Encompassing a broad spectrum of technologies and applications, AI poses many challenges for HCI and design. One such challenge is the need to make AI’s role in a given system legible to the user in a meaningful way. In this paper, we employ a Research through Design (RtD) approach to explore how this might be achieved. Building on contemporary concerns and a thorough exploration of related research, our RtD process reflects on designing imagery intended to help increase AI legibility for users. The paper makes three contributions. First, we thoroughly explore prior research in order to critically unpack the AI legibility problem space. Second, we respond with design proposals whose aim is to enhance the legibility, to users, of systems using AI. Third, we explore the role of design-led enquiry as a tool for critically exploring the intersection between HCI and AI research.

    Emerging Perspectives in Human-Centered Machine Learning

    Current Machine Learning (ML) models can make predictions that are as good as or better than those made by people. The rapid adoption of this technology puts it at the forefront of systems that impact the lives of many, yet the consequences of this adoption are not fully understood. Therefore, work at the intersection of people's needs and ML systems is more relevant than ever. This area of work, dubbed Human-Centered Machine Learning (HCML), re-thinks ML research and systems in terms of human goals. HCML gathers an interdisciplinary group of HCI and ML practitioners, each bringing their unique yet related perspectives. This one-day workshop is a successor of Gillies et al. (2016) and focuses on recent advancements and emerging areas in HCML. We aim to discuss different perspectives on these areas and articulate a coordinated research agenda for the 21st century.

    Human-Centered Machine Learning

    Machine learning is one of the most important and successful techniques in contemporary computer science. It involves the statistical inference of models (such as classifiers) from data. It is often conceived in a very impersonal way, with algorithms working autonomously on passively collected data. However, this viewpoint hides considerable human work of tuning the algorithms, gathering the data, and even deciding what should be modeled in the first place. Examining machine learning from a human-centered perspective includes explicitly recognising this human work, as well as reframing machine learning workflows based on situated human working practices, and exploring the co-adaptation of humans and systems. A human-centered understanding of machine learning in human context can lead not only to more usable machine learning tools, but to new ways of framing learning computationally. This workshop will bring together researchers to discuss these issues and suggest future research questions aimed at creating a human-centered approach to machine learning.

    Crowdsourcing the Perception of Machine Teaching

    Teachable interfaces can empower end-users to attune machine learning systems to their idiosyncratic characteristics and environment by explicitly providing pertinent training examples. While such interfaces facilitate control, their effectiveness can be hindered by users' lack of expertise or by misconceptions. We investigate how users may conceptualize, experience, and reflect on their engagement in machine teaching by deploying a mobile teachable testbed on Amazon Mechanical Turk. Using a performance-based payment scheme, Mechanical Turkers (N = 100) are asked to train, test, and re-train a robust recognition model in real time with a few snapshots taken in their environment. We find that participants incorporate diversity in their examples, drawing parallels to how humans recognize objects independently of size, viewpoint, location, and illumination. Many of their misconceptions relate to consistency and to the model's capacity for reasoning. With limited variation and few edge cases in their testing, the majority of participants do not change strategies on a second training attempt. (Comment: 10 pages, 8 figures, 5 tables, CHI 2020 conference)
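
    For a sense of what such a teachable testbed asks of the underlying model, here is a minimal few-shot recognizer sketch: the user's example snapshots are averaged into class centroids, and new snapshots are matched to the nearest centroid. The 3-D "embeddings" are toy stand-ins for real image features, and the nearest-centroid design is an assumption, not the testbed's actual recognizer.

```python
# Sketch of a teachable recognizer: teach with a few example embeddings per
# object, then recognize new snapshots by nearest class centroid.
import numpy as np
from typing import Dict, List

def teach(examples: Dict[str, List[np.ndarray]]) -> Dict[str, np.ndarray]:
    """Average each object's example embeddings into a class centroid."""
    return {label: np.mean(vecs, axis=0) for label, vecs in examples.items()}

def recognize(centroids: Dict[str, np.ndarray], query: np.ndarray) -> str:
    """Return the label whose centroid is nearest to the query embedding."""
    return min(centroids, key=lambda label: np.linalg.norm(centroids[label] - query))

# Teaching phase: a few varied snapshots per object; variation in size,
# viewpoint, or lighting would show up as spread in the embeddings.
examples = {
    "mug":  [np.array([0.9, 0.1, 0.2]), np.array([0.8, 0.2, 0.1])],
    "keys": [np.array([0.1, 0.9, 0.7]), np.array([0.2, 0.8, 0.9])],
}
centroids = teach(examples)

# Testing phase: the user checks the model with a new snapshot.
print(recognize(centroids, np.array([0.85, 0.15, 0.2])))   # -> "mug"
```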